Uncertain Reward-Transition MDPs for Negotiable Reinforcement Learning

نویسنده

  • Nishant Desai
چکیده

Uncertain Reward-Transition MDPs for Negotiable Reinforcement Learning

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Geometric Traversal Algorithm for Reward-Uncertain MDPs

Markov decision processes (MDPs) are widely used in modeling decision making problems in stochastic environments. However, precise specification of the reward functions in MDPs is often very difficult. Recent approaches have focused on computing an optimal policy based on the minimax regret criterion for obtaining a robust policy under uncertainty in the reward function. One of the core tasks i...

متن کامل

Building Relational World Models for Reinforcement Learning

Many reinforcement learning domains are highly relational. While traditional temporal-difference methods can be applied to these domains, they are limited in their capacity to exploit the relational nature of the domain. Our algorithm, AMBIL, constructs relational world models in the form of relational Markov decision processes (MDPs). AMBIL works backwards from collections of high-reward state...

متن کامل

Exploring compact reinforcement-learning representations with linear regression

This paper presents a new algorithm for online linear regression whose efficiency guarantees satisfy the requirements of the KWIK (Knows What It Knows) framework. The algorithm improves on the computational and storage complexity bounds of the current state-of-the-art procedure in this setting. We explore several applications of this algorithm for learning compact reinforcement-learning represe...

متن کامل

Approximate Policy Iteration for Markov Control Revisited

Q-Learning is based on value iteration and remains the most popular choice for solving Markov Decision Problems (MDPs) via reinforcement learning (RL), where the goal is to bypass the transition probabilities of the MDP. Approximate policy iteration (API) is another RL technique, not as widely used as Q-Learning, based on modified policy iteration. In this paper, we present and analyze an API a...

متن کامل

Solving Markov Decision Processes via Simulation

This chapter presents an overview of simulation-based techniques useful for solving Markov decision problems/processes (MDPs). MDPs are problems of sequential decision-making in which decisions made in each state collectively affect the trajectory of the states visited by the system — over a time horizon of interest to the analyst. The trajectory in turn, usually, affects the performance of the...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017